Abstract


The purpose of this research study is two-fold: first, to evaluate and compare the
innovativeness of Greece relative to the European Union using indicators from the European
Innovation Scoreboard, and second, to propose practices and techniques concerning the
utilization of machine learning for modeling and analyzing innovation in general. Systematic
analysis is conducted regarding the over-performance and the under-performance of Greece 
and the trends of these indicators over the years through statistical techniques and methods. 
Machine learning and advanced statistical methods are incorporated to ascertain the most 
1. Introduction


Innovation is one of the two fundamental functions of an organization (Drucker, 1954). It is the process of translating
an idea or invention into a good or service that creates value or for which customers will pay. The idea must be
replicable at an economical cost and must satisfy a specific need. Innovation involves the intentional application of
information, creativity, and initiative in deriving greater or different values from resources (Henderson and Lentz, 1995). It
involves all processes by which new ideas are generated and converted into useful products. It also includes the
development of new sources of supply of raw materials (Schumpeter and Opie, 1934). In business, innovation is the
outcome of ideas applied by the company to further satisfy the requirements and expectations of its customers.


Innovation may also be characterized as a process that provides the company, its suppliers and its consumers with added
value and a degree of novelty, creating new processes, products, technologies, services and new forms of marketing.
It is also the adoption of new or significantly improved elements to create added value for the organization directly or
indirectly for its customers (Carnegie, Roderick and Business Council of Australia, 1993). In a social context, innovation
aids in the development of new methods for alliance creation, joint venturing, flexible work hours, and the creation of
buyers’ purchasing power. It is synonymous with risk-taking. Organizations and companies that develop revolutionary new
products and services take on great risk because they create new markets.


Innovation’s role in market development and coordination is an intrinsic one. The value of innovative technologies, 
including product development, management practices, ways of doing work and beyond, is critical in all human fields 
(Tohidi and Jabbari, 2012). 


Industry, in turn, is crucial for competitiveness, and innovation is a key factor in this regard (Industrial
policy | Internal Market, Industry, Entrepreneurship and SMEs). Industry commonly accounts for around 80% of a
country’s exports. Some 65% of research and development (R&D) investment in the private sector comes from
manufacturing. Industrial modernization must therefore be broad in every country and include the successful marketing
of product and service innovations, the industrial exploitation of innovative manufacturing technologies, and innovative
business models.


Organizations that prioritize innovation are also those with the largest increases in turnover.
About 79% of companies that have introduced at least one innovation since 2011 experienced an increase of more than
25% in their turnover by 2014 (Innobarometer | Internal Market, Industry, Entrepreneurship and SMEs).


Innovation policy specifically targets, to a large extent, Small and Medium-sized Enterprises (SMEs). Small
companies face constraints on innovation and on the commercialization of their innovations. About 63% of firms with fewer
than nine employees reported having introduced at least one innovation since 2011, compared to 85% of firms with 500 or
more employees. Some 71% of companies with fewer than nine employees found it difficult to commercialize their
innovations due to a lack of financial resources, compared to 48% of companies with 500 or more employees (Innovation
| Internal Market, Industry, Entrepreneurship and SMEs).


This study concentrates on the European Union (EU) and Greece. The European Commission offers numerous tools 
designed to map, monitor and evaluate the performance of the EU in various areas of innovation. The information 
provided helps EU, national and regional policy makers and practitioners to evaluate their performance and policies and 
learn about new patterns and potential market opportunities that can guide evidence-based policy making (Monitoring
innovation | Internal Market, Industry, Entrepreneurship and SMEs).


In this paper, country-level innovation performance is examined by analyzing the data from the European
Innovation Scoreboard (EIS), specifically the 2018 edition. Indicators form the basis of the annual EIS, which provides
a benchmarking of the research and innovation performance of the EU Member States. It specifically offers comparisons of
the relative strengths and limitations of the research and innovation systems at the country level. It
helps Member States identify the areas on which they need to concentrate their efforts to improve their
innovation performance.


The main indicator is the Summary Innovation Index, which summarizes the range of different indicators of innovation
and measures total innovation performance. For detailed information on the definitions, the explanations and the
methods of calculation of the above-mentioned indicators, the reader is referred to (European Commission, 2018).


The specific purpose of this paper is to compare the innovativeness of Greece with the EU average using the indicators
provided by the EIS for the period 2010-2017. A brief analysis of the systematic over-performance and under-performance
of Greece and of the trends of these indicators over the years is provided. Furthermore, machine learning and statistical
techniques are used to identify the key features that influence the fluctuation of the summary innovation score at the EU
and Greece level.


The rest of this paper is organized as follows. Section 2 presents selected studies that are related to our efforts.
Section 3 contains our methodology concerning data collection, pre-processing of the data, statistical
analysis, and the machine learning process. Section 4 provides information about our indicators and Section 5 the predictive
trend and machine learning analysis of our data. Section 6 summarizes our results concerning the performance of Greece
against the EU average and presents our concluding remarks. Links to detailed data and further supporting information
are provided in the Appendix.


2. Related work 


As innovation analysis is considered fundamental and one of the most important keys for the economic growth of each 
country, many researchers from different fields focus their efforts on studying various aspects of innovation. During the 
past few decades, a significant amount of research work has been conducted using advanced tools and indicators 
examining innovation. 
SMEs constitute a principal part of the industry and economy in all modern countries. In 1990, when
little was known about SME innovation activities, Hyvarinen reviewed the definitions of innovation technology
and the factors underlying the innovation activities of SMEs. That work explains various concepts
for approaching the innovativeness of SMEs and their contribution to total innovation (Hyvarinen, 1990).


There exists a rather limited amount of scientific work, presented below, regarding Greek innovativeness
and economic performance. Researchers have used empirical analysis to show the reluctance of the Greek private
sector to invest in R&D and the low productivity of innovation (Beneki et al., 2012). Others investigate the impact of
R&D activity on the operational performance of SMEs in the small, open Greek economy (Eleftherios et al., 2016).


Additional studies focus on the significance and awareness of a set of established strategic determinants of technological
innovation in the context of European newly industrialized countries. Research studies such as (Souitaris, 2001) provided
evidence from interviews conducted with Greek manufacturing firms (mainly SMEs), measuring their innovation rate as
well as key performance indicators. Using statistical analysis tools, they summarize and highlight the
indicators that have the strongest influence on innovation. This study also indicates that the Greek institutional
context exerted insufficient influence on innovation and that the highly innovative companies were the ones that
overcame barriers such as the low supply of technology and other innovation obstacles.


Moreover, such studies have also been developed at the regional level and provide an evaluation of the numerous policy instruments
used by regional governments in Europe to promote innovation activity in SMEs (Asheim et al., 2003). Scientists try to
find patterns of innovation in regional innovation structures, which are becoming increasingly diverse, complex and
nonlinear. To address these issues, they use multi-output models (Hajek and Henriques, 2017).


There is variation in the methods utilized by researchers to forecast or analyze in depth innovation or
specific indicators of innovation. A wide variety of machine learning and deep learning algorithms is commonly used.
Advanced machine learning methods, such as ensembles of decision trees, are utilized in (Hajek and Stejskal, 2015), where
the authors demonstrate the use of ensembles of decision trees to model the intrinsic nonlinear characteristics of the innovation
process and apply their method to predict the innovation activity of chemical companies. In addition, other studies use
non-linear methods based on Artificial Intelligence, namely neural networks (Paz-Marin et al., 2012; Hajek and Henriques,
2017; Chien et al., 2010; Wang and Chien, 2006; Saberi and Yusuff, 2012). In the aforementioned study (Wang and Chien,
2006), the authors forecast innovation performance using a neural network model with fuzzy rules and provide evidence from
the Taiwanese manufacturing industry. They also implement an adaptive neuro-fuzzy inference system to measure
innovation performance through technical information resources and innovation objectives. In (Saberi and Yusuff,
2012), the authors develop an Artificial Neural Network classification method and prediction model that can assist companies,
especially SMEs, in evaluating Advanced Manufacturing Technology implementation contributing to innovation.


In addition, the fact that decision makers often group the objects of their analysis into homogeneous classes leads
them to use clustering algorithms; see for example (Paz-Marin et al., 2012; Klimova et al., 2016; Roszko-Wójtowicz and
Białek, 2018; Saberi and Yusuff, 2012). Others prefer more traditional techniques based on statistical analysis and
equation modeling (Kalapouti et al., 2017; Jan van den Ende and Timo van Balen, 2017).


Please note that a variety of data is utilized, which may come from a single source or from a combination of multiple data
sources. Studies might rely on traditional methods like interviews (Souitaris, 2001), on well-formed databases obtained
from Eurostat’s official website (Kalapouti et al., 2017; Roszko-Wójtowicz and Białek, 2018; Roszko-Wójtowicz and
Białek, 2017), the World Bank Database and SCImago Journal (Paz-Marin et al., 2012), or on data obtained from companies
providing business services, such as ICAP (Beneki et al., 2012).


We should highlight the research from Rotterdam School of Management regarding the innovativeness of the 
Netherlands compared to EU countries (Jan van den Ende and Timo van Balen, 2017). They statistically compared the 
Netherlands versus EU in indicators of innovation using data from EIS database. Briefly, the methods they utilized are 
generalized least squares regression for trend estimation and statistics. 


This study reviews several of the methods described above, implements the associated
statistical techniques and machine learning models, and applies them to indicator time series data. It assesses the
systematic over-performance and under-performance of Greece relative to EU countries and compares the trends of
Greece and the EU regarding the composite and simple indicators of innovation using data from the EIS database. Moreover, it
presents the application of machine learning models to identify, in a model-based sense, the most crucial features (indicators)
influencing the volatility of the summary innovation index.
It is worth pointing out that (Jan van den Ende and Timo van Balen, 2017) is the study most closely related to ours.
They tackle the same problem of analyzing innovativeness at the country level through a very interesting, innovative
and effective methodology. This paper is inspired by their analysis of the Netherlands relative to the EU regarding
innovation performance, and we adopt a similar approach to analyze the case of Greece. Furthermore, our study adds
scientific value by involving machine learning to estimate the most important indicators, bringing more information into
the analysis. In addition, further analysis of the indicators utilizing principal components analysis and advanced data visualization
methods adds to the contribution of this study.


3. Analytics work-flow and overall methodology 


The basic elements of our overall methodology for our study are presented. The related architectural structure of our 
analytics workflow, which consists of the following three main stages, is graphically depicted in Figure 1. 
Figure 1: Analytics work-flow and methodology


3.1. First stage — pre-processing 


Data is acquired from the 2018 database edition of the EIS, containing Greece and the EU average for the period 2010-2017.
It is collected from the EIS website and it is of high quality, with few missing observations for Greece. The EIS 2018
database comprises many dimensions. This study utilizes the composite indicators and each individual indicator for the
years 2010-2017.


The EIS 2018 database is processed to select only the indicators and composite indicators regarding the EU and Greece.
The data is preprocessed and two indicators (“Foreign doctorate students as a % of all doctorate students” and
“Employment in fast-growing enterprises (% of total employment)”) are dropped from our analysis because of missing
values for Greece. Time series data from 2010 to 2017 are built using the standardized scores given by the database for
each indicator and composite indicator. In total, time series data comprising 25 indicators and 11 composite indicators
(including the summary innovation index) are collected. A short description of these indicators, together with their long
acronyms used in the presentation of our study is given in 6 below, while their details are available at (European 
Commission, 2018; Jan van den Ende and Timo van Balen, 2017). 
3.2. Second stage — statistical analysis


The second stage consists of statistical analysis of the aforementioned time series data. Each indicator and composite
indicator of Greece versus the EU average over the time frame (2010-2017) is visualized. The visualization process
includes two graphs per indicator (or composite indicator): the actual values of the time series and the percentage change
each year. We then comment on the graphs and provide a statistical test to measure the systematic over-performance or
under-performance of Greece compared to the EU. These comparisons are based on a two-sample t-test assuming
unequal variances between the (non-missing) data observations over the studied period. This method is commonly
known as Welch’s t-test (Welch, 1947). A hypothesis $H_0$ is stated to test whether Greece outperforms the EU on average.
For this test, we accept or reject $H_0$ according to $t_{stat}$, the test statistic, and the $p_{value}$, the
probability of finding the observed, or more extreme, results when the null hypothesis
$H_0$ is true. The $t_{stat}$, the $p_{value}$ and our decision based on an alpha level $\alpha = 0.05$, i.e., 95% statistical
significance, are reported. Since $H_0$ is tested with a one-tailed test, it is rejected when $p_{value}/2 < 0.05$.
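As an illustration of the test described above, the following minimal sketch computes the Welch test statistic and the Welch-Satterthwaite degrees of freedom in plain Python. The function name `welch_t` and the sample scores are hypothetical and are not taken from the EIS data. The p-value itself would be read from a Student's t distribution with `df` degrees of freedom, which is left out here to keep the sketch dependency-free.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t_stat, df) for Welch's two-sample t-test (unequal variances)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors of the two sample means (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, df


# Hypothetical normalised scores for 2010-2017 (illustrative values only).
eu = [0.50, 0.51, 0.50, 0.52, 0.51, 0.52, 0.53, 0.54]
gr = [0.62, 0.63, 0.60, 0.61, 0.58, 0.57, 0.59, 0.60]

# EU is passed first, so a negative t_stat points towards Greece scoring higher.
t_stat, df = welch_t(eu, gr)
```

With the EU sample passed first, a one-tailed rejection of $H_0$ would require both a negative `t_stat` and $p_{value}/2 < 0.05$; this sign convention is an assumption of the sketch.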


In addition, the trend-lines (assumed to be linear) are statistically analyzed for each indicator (or composite indicator) of
Greece and of the EU using generalized least squares regression on the years, with a correction for autocorrelation, at the
Greece and EU level. The generalized least-squares (GLS) regression method with auto-regressive errors (AR(1)), namely
GLSAR (McKinney et al., 2011), is used for the linear regression lines. The slope coefficient of the trend-line ($b$), its
standard error ($se$) and the level of significance of the trend ($p_{value}$) are reported. Then, the trend-lines of
Greece and the EU are compared statistically using a z-test, following the approach of (Paternoster et al., 1998).
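The idea of a linear trend with AR(1) errors can be illustrated with a small sketch. It is not the GLSAR routine cited in the text; it uses a Cochrane-Orcutt style iteration, a related textbook procedure, written in plain Python so that each step is visible. The function names, the number of iterations, the clamp on `rho` and the sample scores are assumptions made for illustration only.

```python
import math


def ols_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, se_b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    a = mean_y - b * mean_x
    ssr = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, math.sqrt(ssr / (n - 2) / sxx)


def ar1_trend(years, scores, n_iter=10):
    """Slope, its standard error and rho for a linear trend with AR(1) errors."""
    a, b, se = ols_line(years, scores)
    rho = 0.0
    for _ in range(n_iter):
        resid = [y - (a + b * x) for x, y in zip(years, scores)]
        # Lag-1 autocorrelation of the residuals, clamped to stay stationary.
        denom = sum(r * r for r in resid[:-1])
        rho = sum(r0 * r1 for r0, r1 in zip(resid, resid[1:])) / denom if denom > 0 else 0.0
        rho = max(-0.99, min(0.99, rho))
        # Quasi-difference both series to remove the AR(1) component.
        xs = [x1 - rho * x0 for x0, x1 in zip(years, years[1:])]
        ys = [y1 - rho * y0 for y0, y1 in zip(scores, scores[1:])]
        a_star, b, se = ols_line(xs, ys)
        a = a_star / (1.0 - rho)
    return b, se, rho


# Hypothetical indicator scores for 2010-2017 (illustrative values only).
years = list(range(2010, 2018))
scores = [0.502, 0.509, 0.521, 0.528, 0.542, 0.549, 0.561, 0.568]
b, se, rho = ar1_trend(years, scores)
```

The returned slope and standard error play the role of $b$ and $se$ in the text; the significance level of the trend is not computed in this sketch.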


The motivation and the details of our approach for comparing two trend-lines (linear regression lines) are given in
(Paternoster et al., 1998). It is again a hypothesis test, where $H_0: b_1 = b_2$, i.e., $b_1 - b_2 = 0$, and the alternative hypothesis is
$H_1: b_1 \neq b_2$, i.e., $b_1 - b_2 \neq 0$. An alpha level of statistical significance equal to $\alpha = 0.05$ is used, i.e., 95% statistical
significance. Thus, insignificance is concluded for $p_{value} > 0.05$ (two-tailed test). Again, the difference in the slope
coefficients of the trend-lines ($b_{diff}$), its standard error ($se$), the $z_{score}$ and the level of significance of the
difference in trend ($p_{value}$) are reported.
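The comparison of two slopes can be sketched as follows, using the statistic $z = (b_1 - b_2) / \sqrt{se_1^2 + se_2^2}$ from Paternoster et al. (1998). The function name is hypothetical, and the inputs are the rounded Linkages values as they appear in Table 2, so the result only approximates the reported z-score. The sketch returns both the one-sided tail probability and the two-sided p-value without claiming which of the two the reported tables use.

```python
import math


def slope_difference_test(b1, se1, b2, se2):
    """z-test for the equality of two regression slopes."""
    b_diff = b1 - b2
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    z = b_diff / se_diff
    # Standard normal tail area via the complementary error function.
    p_one_sided = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    return b_diff, se_diff, z, p_one_sided, 2 * p_one_sided


# Rounded Linkages slopes and standard errors (Greece first, then the EU).
b_diff, se_diff, z, p_one, p_two = slope_difference_test(0.001, 0.004, 0.003, 0.004)
```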


3.3. Third stage — machine learning 


The third stage includes the machine learning part of our study. We use the following approach to estimate the
contribution of each indicator to the overall innovation performance of Greece and the EU, i.e., to what degree it affects
the summary innovation index of Greece or of the EU.


The analysis of importance is conducted only on simple indicators, because we want to study the basis of the
problem rather than the composite indicators, which contain groups of simple indicators. Firstly, a correlation
analysis of the indicators is carried out to avoid including strongly correlated features in our further analysis,
and heat-maps are provided for interpretation purposes. Highly correlated features, i.e., those with a
correlation coefficient above 0.90 ($r > 0.90$), are dropped from our analysis. This is a common tactic to avoid
multi-collinearity issues and to get better results from model-based feature importance and from machine learning
algorithms in general.
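The correlation filter can be sketched as below. This is a minimal illustration under stated assumptions: the text gives the 0.90 threshold but not which member of a correlated pair is dropped, so the sketch keeps the first feature encountered and drops later ones that correlate with it. The function names and the three example series are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long, non-constant series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def drop_correlated(features, threshold=0.90):
    """Split feature names into (kept, dropped) using r > threshold."""
    kept, dropped = [], []
    for name, series in features.items():
        # Assumption: a feature is dropped if it correlates with any kept one.
        if any(pearson(series, features[k]) > threshold for k in kept):
            dropped.append(name)
        else:
            kept.append(name)
    return kept, dropped


# Hypothetical yearly scores for three indicators (illustrative values only).
features = {
    "ind_a": [1, 2, 3, 4, 5, 6, 7, 8],
    "ind_b": [3, 5, 7, 9, 11, 13, 15, 17],  # linear in ind_a, so r is about 1
    "ind_c": [3, 1, 4, 1, 5, 9, 2, 6],
}
kept, dropped = drop_correlated(features)
```

Using the absolute value of $r$ instead would also remove strongly negatively correlated pairs; the sketch follows the $r > 0.90$ rule as written.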


In addition, the relationship of the indicator correlation matrices of Greece, the EU and the Netherlands is examined
utilizing an advanced statistical test, the Mantel test (Mantel, 1967), and the Frobenius norm of the matrices. The Mantel test
is a non-parametric statistical test that computes the statistical significance of the correlation through permutations of
the rows and columns of one of the input matrices. The test statistic is the Pearson correlation coefficient $r$. Specifically,
a two-sided Mantel test is utilized with 10000 permutations to identify the correlation between the two matrices. Please
recall that the Frobenius norm, commonly known as the Euclidean norm, is a matrix norm of an $m \times n$ matrix $A$ defined by

$$\|A\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2}.$$
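Both quantities can be sketched in plain Python. The sketch assumes symmetric square matrices, compares only the upper-triangle entries, and estimates the two-sided p-value as the share of permutations whose absolute correlation is at least as large as the observed one (counting the observed arrangement once). These conventions, the function names and the two small matrices are assumptions for illustration, not the exact routine used in the study.

```python
import math
import random


def frobenius_norm(matrix):
    """Square root of the sum of squared absolute entries."""
    return math.sqrt(sum(abs(v) ** 2 for row in matrix for v in row))


def _pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def _upper(matrix, order):
    """Upper-triangle entries after reordering rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(m1, m2, permutations=10000, seed=0):
    """Two-sided Mantel test; returns (r_observed, p_value)."""
    n = len(m1)
    rng = random.Random(seed)
    base = _upper(m1, list(range(n)))
    r_obs = _pearson(base, _upper(m2, list(range(n))))
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # permute rows and columns of m2 together
        if abs(_pearson(base, _upper(m2, order))) >= abs(r_obs):
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Hypothetical 4 x 4 correlation matrices (illustrative values only).
gr_corr = [[1.0, 0.8, 0.3, 0.1], [0.8, 1.0, 0.5, 0.2], [0.3, 0.5, 1.0, 0.6], [0.1, 0.2, 0.6, 1.0]]
eu_corr = [[1.0, 0.7, 0.2, 0.3], [0.7, 1.0, 0.4, 0.1], [0.2, 0.4, 1.0, 0.5], [0.3, 0.1, 0.5, 1.0]]
r_obs, p_value = mantel(gr_corr, eu_corr)
norm_gr = frobenius_norm(gr_corr)
```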


Then, a vector ($v$) is constructed that models the fluctuation of the summary innovation index over the
time period (2010-2017) as follows. A sliding window that covers a single year is moved over the time frame, starting from the beginning.
If the value of the summary innovation index in the currently considered year is higher than the value for the
previous year we set $v_i = 1$, otherwise we set $v_i = 0$. Thus, it is a binary classification problem with the indicators as
features and the vector $v$ as target label. A 3-fold cross validation is used on our data to train four machine learning
models for classification. The models are Logistic Regression, Random Forest, Extra-Trees and Support Vector Machines.
For each model, the estimates of feature importance across all three folds of cross validation are averaged to get a better
estimate of model-based feature importance. In order to get a final summary of the most important features for Greece and
the EU, the feature importance of each model is expressed as a percentage, and the percentages are then averaged
across the four models. Percentages are used for homogeneity and averaging purposes, because the procedure and the
values for calculating the most important features are different in each model. Finally, we discuss the most important
indicators, which drive and affect most the fluctuation of the summary innovation index at the Greece and EU level.
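The construction of the target vector and the percentage averaging can be sketched as follows. The sketch assumes that the first year has no label because it has no previous year, and that importances are converted to percentages through their absolute values (relevant for signed coefficients such as those of a logistic regression); neither detail is stated in the text. All names and numbers are hypothetical, and the per-fold averaging inside each model is taken as already done.

```python
def direction_labels(index_values):
    """Label 1 when the index is higher than in the previous year, else 0."""
    return [1 if cur > prev else 0 for prev, cur in zip(index_values, index_values[1:])]


def to_percent(importances):
    """Rescale one model's importances so that they sum to 100."""
    total = sum(abs(v) for v in importances.values())
    return {name: 100.0 * abs(v) / total for name, v in importances.items()}


def average_importance(per_model):
    """Average the percentage importances across all models."""
    percentages = [to_percent(imp) for imp in per_model.values()]
    names = percentages[0].keys()
    return {name: sum(p[name] for p in percentages) / len(percentages) for name in names}


# Hypothetical summary innovation index values (illustrative only).
v = direction_labels([0.50, 0.52, 0.51, 0.51, 0.55])

# Hypothetical fold-averaged importances from two models.
summary = average_importance({
    "model_1": {"ind_a": 1.0, "ind_b": 3.0},
    "model_2": {"ind_a": -2.0, "ind_b": 2.0},
})
```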


4. Data analysis 


The visualizations of the time series data and of the percentage change over the years 2010-2017 for each
composite indicator and indicator used in this study are presented in the Appendix.


4.1. Composite indicators 


Table 1 displays inferential statistics on the composite indicators time series data. For presentation purposes, all values
are rounded to certain significant digits.


The summary innovation index of the EU is higher than that of Greece over the whole period of our study; the EU outperforms
Greece in total innovation. Observing the percentage change graph of this composite indicator in Figure 2, there is a 12%
decrease in the innovation score of Greece in 2014 compared to the previous year. However, both the EU and Greece show
an upward trend in the summary innovation index from 2014 onwards.


It is worth highlighting the composite indicator “Innovators” in Figure 2, where Greece outperforms the EU average
in all years from 2010 to 2017. We point out that this composite indicator is rather important, as it comprises three
simple indicators. In a period of economic crisis in Greece, there is a systematic over-performance on average of Greece
versus the EU in the share of companies that have introduced innovations to the market or within their organizations,
including product and process innovators, marketing and organizational innovators, and SMEs innovating in-house.


Next, the systematic over- or under-performance of Greece compared
to the EU average is tested statistically. The means of the EU and Greece values are denoted $\mu_{EU}$ and $\mu_{GR}$,
respectively. We state the null hypothesis $H_0: \mu_{EU} \geq \mu_{GR}$ and the alternative hypothesis $H_1: \mu_{EU} < \mu_{GR}$. With a two-sample
t-test assuming unequal variances, we test whether the EU indicator has a greater value on average than the Greek
indicator. In this way, a test is performed for over-performance or under-performance of Greece compared to the EU
average. Table 1 contains the composite indicators, the $t_{stat}$ of the t-test, the $p_{value}$ and our decision on $H_0$ based on 95%
statistical significance ($\alpha = 0.05$).


As we can clearly see, we have strong evidence to reject the null hypothesis $H_0$ ($p_{value}/2 < 0.05$) that the EU average
value is higher than the value of Greece. This means that we statistically confirm the systematic over-performance of Greece
versus the EU in the composite indicator “Innovators”.
Figure 2: Top: Summary innovation index (a) and its percentage change (b). Bottom: Innovators (c)
and its percentage change (d).


Thanasis Zoumpekas et al. / IntJ.Data.Sci. and Big Data Anal. 1(1) (2021) 20-42 


[Figure 2 image not reproduced. Recoverable panel details: (b) “Percentage Change 2010-2017, Summary Innovation Index” and
(d) “Percentage Change 2010-2017, Innovators” plot Percentage (%) against Year for GR and EU; (c) “Innovators” plots
Normalised Score against Year for Greece and EU.]


4.2. Indicators 


Table 1 displays inferential statistics on the indicators time series data. All values in this table are rounded to 3
decimal places.


Next, we focus on the indicators where Greece outperforms the EU with strong statistical evidence, according
to Table 1. As clearly seen, Greece exceeds the EU on average in six indicators, namely “Innovative SMEs collaborating
with others (% of SMEs)”, “Non-R&D innovation expenditures (% of turnover)”, “Sales of new-to-market and new-to-firm
innovations as % of turnover”, “International scientific co-publications per million population”, “SMEs introducing
marketing or organizational innovations as % of SMEs” and “Percentage population aged 25-34 having completed
tertiary education”. The latter three are visualized in Figure 3.


The statistical hypothesis testing is displayed in Table 1, which shows the indicators, the $t_{stat}$ of the t-test, the $p_{value}$ and
our decision on $H_0$ based on 95% statistical significance ($\alpha = 0.05$). For the effectiveness of the presentation of our data
analysis, the names of the indicators are truncated into long acronyms, which are presented in the Appendix.


Table 1: Composite indicators (the first eleven rows) and indicators (the remaining rows). First three columns: t-test
on $H_0$. Fourth column: performance of Greece (GR) vs the European Union (EU) (↓: Lower, ↑: Higher). Fifth
and sixth columns: trend of GR and of the EU (+: Positive, -: Negative). Last two columns: correlation analysis
decision on indicators of GR and the EU (✗: Drop, ✓: Keep)


Indicator  t_stat  p-value  H_0  Performance  Trend (GR)  Trend (EU)  Correlate (GR)  Correlate (EU)

Summary_Innovation_I 25.49 .000 Yi wy = 

Human_Resources 10.14 .000 ¥ a + 

Research_Systems 4.34 .001 Jf wy + + 
Innovation-friendly_env 14.44 .000 A ay 

Finance_and_support 15.45 .000 Jv ay + 

Firm_investments 14.94 .000 ef & = 

Innovators -2.98 013 x ¥ 

Linkages 7.20 .000 rd a 

Intellectual_assets 23.95 .000 WA a + + 
Employment_impacts 14.61 .000 JY ay 

Sales_impacts 5.34 001 ve a - - 
Broadband_penetration 8.25 -000 of a Vv i 
Venture_capital 12.58 .000 v4 % - - Jf J 
Design_applications 29.72 .000 Vv 4 + + J v 
Trademark_apps 7.91 .000 v 4 + + x i 
Employment_activities 8.43 .000 V4 a x x 
Enterprises_training 8.05 .000 rd 4 ov J 
Innovative_Smes -6.44 .000 x ¥ v i 
International_publications -2.36 034 x x + + x x 
Knowledge_exports 5.83 .001 Jv wy - - JY x 
New_doctorate_grads 8.00 .000 Jv a PA x 
Non_rd -2.85 .013 x ¥ v x 
Opportunity_enterpre 27.90 .000 f a J 
Pct_patent 65.42 .000 J x + A J 
Percentage_tertiary_edu x ¥ F + x x 
Percentage_lifelong lea A a + + x Jv 
Private_co_funding Jv a v v 
Public_private_pubs A x Jv Jv 
Rd_ business v x pe x x 
Rd_public Pi a + + x Jv 
Sales x ¥ = = x J 
Scientific_pubs v x x x 
Smes_in_house Jv x x J 
Smes_marketing x ¥s = - x x 
Smes_product v x x x 
Exports_technology J aX J v 
Figure 3: Top: International scientific co-publications per million population (a) and its percentage change (b).
Middle: SMEs introducing marketing or organizational innovations as % of SMEs (c) and its percentage
change (d). Bottom: Percentage population aged 25-34 having completed tertiary education (e) and its
percentage change (f)
5. Predictive analytics


5.1. Trend analysis of indicators 


This section provides trend analysis for the indicators of Greece and the EU average. In addition, the two trend-lines are
statistically compared. Table 2 presents the trend-line statistics of the composite indicators for Greece and the EU,
indicating the upward or downward trends. All values in this table are rounded to 3 (2 in the case of the $z_{score}$) significant
decimal digits.

Table 2: Trend-line statistics for Greece and the European Union and the difference between them. Gray shading denotes
statistical significance at the 95% level ($\alpha = 0.05$)
Indicator  Greece  European Union  Difference
b  se  p-value  b  se  p-value  b_diff  se  z-score  p-value
Summary_Innovation_I -.002 -003 575 -.007 -003 -2.23 
Human_Resources O11 002 002 -.003 -003 -1.09 .139 
Research_Systems O11 001 001 .004 .002 2.25 
Innovation-friendly_env .005 .005 .006 -.013 .008 -1.62 -053 
Finance_and_support .019 .004 .006 .004 .007 .520 .302 
Firm_investments 002 -003 -003 -.010 004 -2.49 
Innovators -.025 .010 .062 -.017 .004 -.008 O11 -.717 
Linkages 001 .004 .785 003 .004 419 -.002 .005 -.386 350 
Intellectual_assets .016 002 001 121 018 002 9.43 
Employment_impacts .009 .005 149 -.001 .002 727 .010 .006 1.73 
Sales_impacts -.045 .010 .010 -5.18 
Broadband_penetration .010 .006 144 .035 .010 -2.39 
Venture_capital -.007 .002 .035 .013 -3.14 
Design_applications .015 .002 -.003 .002 8.43 
Trademark_apps .030 .003 .005 .003 7.09 
Employment_activities .009 .005 .006 -005 506 
Enterprises_training -.010 .010 .016 O11 -2.35 
Innovative_Smes -010 008 010 012 -030 
International_public .015 001 .013 001 2.04 
Knowledge_exports -.042 .004 .006 .004 -11.88 
New_doctorate_grads -004 003 .031 007 -4.01 
Non_rd .005 .007 498 014 .010 -.807 210 
Opportunity_enterpre -.001 .005 896 -.000 .006 -.062 475 
Pct_patent .003 001 .069 | -.006 .002 4.75 
Percentage_tertiary_edu .019 .005 2.15 
Percentage_lifelong_lea .009 .002 3.70 
Private_co_funding -.003 -008 -.247 403 
Public_private_pubs -.004 .004 -1.18 118 
Rd_business O11 -003 
Rd_public .046 .008 -.004 .008 
Sales -.100 .030 .006 809 -.098 031 -3.20 oo 
Scientific_pubs .006 .003 .003 .003 1.09 139 
Smes_in_house -.012 014 001 015 .038 485 
Smes_marketing -.044 007 -.024 .008 -3.10 | on | 
Smes_product -.018 015 -.000 015 -.016 494 
Exports_technology .006 .005 -.008 .006 -1.37 .086 


We start with the composite indicators and highlight the summary innovation index. The observable trends in the
summary innovation index in the aforementioned tables are positive for the EU, but negative for Greece. The score of Greece
decreases by 0.2% per year, while the score of the EU increases by 0.5% per year. However, the slope of Greece’s
trend-line is not significant at 95% statistical significance ($p_{value} = 0.575$), while that of the EU is statistically
significant ($p_{value} = 0.011$). Please note that the difference between the trend-lines of Greece and the EU is calculated
as described in Section 3 using a z-test. The difference between the Greece and EU trend-lines in the summary innovation
index is statistically significant.


Let us focus next on statistically significant trends ($p_{value} < 0.05$). For Greece, there is an upward trend in human
resources, research systems, finance and support and intellectual assets, and a downward trend in sales impacts. The first
two exhibit an increase of 1.1% every year, while finance and support and intellectual assets show 1.9% and 1.6%
respectively. Sales impacts of Greece decrease by 4.5% every year.


The EU shows upward trends in the summary innovation index, human resources, research systems, innovation-friendly 
environment and firm investments, and a downward trend in innovators. EU human resources and firm investments increase 
by 1.4% and 1.3% every year respectively. Research systems and innovation-friendly environment increase by 0.7% and 
1.7% every year respectively. Innovators, however, decrease by 1.7% every year in the EU average. 


The trends of Greece’s composite indicators are statistically compared with those of the EU in Table 2, according to 
the methodology described above. Significant differences between the EU and Greek trend-lines are observed in the 
summary innovation index, research systems, firm investments, intellectual assets, employment impacts and sales 
impacts. We report the slope difference b_diff, its standard error se, the z-score of the statistical test and the 
p-value in Table 2. 
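
The quantities reported in the Difference columns can be related to each other with a short sketch. It assumes that the z-score is the slope difference divided by its standard error and that the p-value is a one-sided tail probability of the standard normal distribution; both are assumptions inferred from the tabulated values, since the procedure of section 3 is not repeated here. The function name `slope_difference_test` is hypothetical, and the inputs are the rounded values that appear to belong to the Sales row of Table 2.

```python
import math


def slope_difference_test(b_diff: float, se_diff: float) -> tuple[float, float]:
    """Return the z-score of a slope difference and its one-sided p-value.

    Assumption: z = b_diff / se_diff, with a standard normal reference
    distribution. This is a sketch, not the authors' exact procedure.
    """
    z = b_diff / se_diff
    # One-sided tail probability of the standard normal: P(Z > |z|)
    p_one_sided = 0.5 * math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_one_sided


# Rounded values that appear to correspond to the Sales row of Table 2
z, p = slope_difference_test(-0.098, 0.031)
print(f"z = {z:.2f}, one-sided p = {p:.4f}")
```

Because the inputs are rounded to three decimals, the z-score obtained this way can differ slightly from the tabulated one.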


Next we move to the simple indicators. The upward or downward trend of each indicator is identified for both Greece 
and the EU, and the two trend-lines are then statistically compared. We concentrate again on indicators with 
p-value < 0.05, indicated by the gray-shaded values in the tables. 


Greece’s R&D expenditure in the public sector and Trademark applications are growing at a fast pace. However, Greece’s 
Sales of new-to-market and new-to-firm innovations are decreasing rapidly. 


The rates of change of the EU indicators are more stable than those of Greece. This is expected, since EU indicator 
scores are averages over the EU countries. However, we would like to highlight the fast growth in broadband 
penetration, venture capital and new doctorate graduates. EU SMEs introducing product or process innovations decrease 
by 2.0% every year. 


5.2. Indicator importance 


This section presents and analyzes the results of our methodology obtained through machine learning techniques 
concerning the importance of the indicators. 


For visualization and interpretation purposes, heat-maps are utilized for pairwise correlation analysis. Figures 4 and 
5 show correlation heat-maps for the EU and Greece indicators respectively. 


Furthermore, a Mantel test is used to statistically investigate the correlation between our two correlation matrices, 
following the methodology presented in section 3 above. The correlation coefficient of these two matrices is r = 0.08, 
with p-value = 0.13. Such a small coefficient suggests that there is no correlation between the two matrices, i.e., 
between the indicator correlations of Greece and those of the EU. However, the result is not statistically significant, 
since the p-value > 0.05 in a two-sided test. Thus, we are unable to draw a statistically significant conclusion from the 
comparison of these two matrices, which seem not to be correlated. This observation might require further elucidation. 
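
To make the Mantel comparison concrete, the following is a minimal sketch of a permutation-based Mantel test between two symmetric matrices. It is an illustration under stated assumptions, not the implementation used in the study: the function name `mantel_test`, the number of permutations, the two-sided permutation p-value and the synthetic data (8 observations of 6 variables) are all hypothetical choices.

```python
import numpy as np


def mantel_test(a, b, permutations=999, seed=0):
    """Permutation-based Mantel test between two symmetric matrices.

    Returns the Pearson correlation of the off-diagonal upper-triangle
    entries and a two-sided permutation p-value.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = a.shape[0]
    iu = np.triu_indices(n, k=1)  # upper triangle without the diagonal
    r_obs = np.corrcoef(a[iu], b[iu])[0, 1]

    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(permutations):
        perm = rng.permutation(n)
        a_perm = a[np.ix_(perm, perm)]  # permute rows and columns together
        r_perm = np.corrcoef(a_perm[iu], b[iu])[0, 1]
        if abs(r_perm) >= abs(r_obs):
            count += 1
    # The observed statistic counts as one of the permutations
    p_value = (count + 1) / (permutations + 1)
    return r_obs, p_value


# Synthetic stand-ins for two indicator data sets (rows: years, columns: indicators)
rng = np.random.default_rng(42)
x = rng.normal(size=(8, 6))
y = rng.normal(size=(8, 6))
corr_x = np.corrcoef(x, rowvar=False)
corr_y = np.corrcoef(y, rowvar=False)

r, p = mantel_test(corr_x, corr_y)
print(f"r = {r:.2f}, p-value = {p:.3f}")
```

Rows and columns are permuted together so that each permuted matrix is still a valid correlation matrix of relabeled indicators, which is what distinguishes the Mantel test from an ordinary correlation test on the matrix entries.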


The indicator correlation matrices of the Netherlands and the EU are also compared using the Mantel test. In this 
case r = 0.17 and p-value = 0.04. Therefore, there is statistically significant evidence at the 95% level that the EU 
and the Netherlands matrices may be weakly correlated. However, r = 0.17 is a rather small coefficient, commonly 
interpreted as indicating no correlation. For completeness and comparison purposes, the correlation between the 
matrices of Greece and the Netherlands is also examined. Here we get r = 0.07, but we cannot conclude that the two are 
uncorrelated, due to the weak statistical evidence (p-value = 0.07). 


Moreover, another metric is used to measure the degree of difference between these correlation matrices, namely the 
Frobenius norm, as presented in section 3. We point out that the degree of difference between these correlation 
matrices, as measured, is nearly the same in all three cases: the computed norms are 19.75, 18.42 and 18.94. 
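
As an illustration of this metric, the sketch below computes the Frobenius norm of the difference between two small matrices. It assumes that the degree of difference is measured on the element-wise difference of two correlation matrices; the function name `frobenius_distance` and the 2x2 example matrices are hypothetical and are not taken from the study.

```python
import numpy as np


def frobenius_distance(a, b):
    """Frobenius norm of the element-wise difference of two matrices."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    # Square root of the sum of squared entries
    return float(np.sqrt(np.sum(diff ** 2)))


# Two hypothetical 2x2 correlation matrices with opposite off-diagonal sign
corr_a = np.array([[1.0, 0.5], [0.5, 1.0]])
corr_b = np.array([[1.0, -0.5], [-0.5, 1.0]])

d = frobenius_distance(corr_a, corr_b)
print(d)  # equals sqrt(2), since the difference has two entries equal to 1
```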


Features with a correlation of 0.90 or more are removed from our study to prevent multi-collinearity problems. The 
indicators that have been dropped are marked in the last two columns of Table 1, for Greece and the EU respectively. 
After dropping these features we calculate indicator importance as described in section 3. The importance of the top 
indicators of Greece versus the EU may be compared graphically in Figure 6. If we focus only on the five most important 
features, i.e., those that most affect the fluctuation of the summary innovation index for the EU and Greece, only two 
are in common: Venture capital and Design applications. 
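
The correlation filter can be sketched as follows. This is a minimal illustration that assumes a greedy rule (a column is kept only if its absolute correlation with every previously kept column is below the threshold); the text does not state which member of a highly correlated pair is dropped, so that rule, the function name `drop_highly_correlated` and the toy columns `a`, `b`, `c` are assumptions.

```python
import numpy as np


def drop_highly_correlated(data, names, threshold=0.90):
    """Greedily drop columns whose |correlation| with a kept column >= threshold."""
    corr = np.abs(np.corrcoef(data, rowvar=False))
    keep = []
    for j in range(corr.shape[0]):
        # Keep column j only if it is not too correlated with any kept column
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    kept_names = [names[j] for j in keep]
    dropped_names = [names[j] for j in range(len(names)) if j not in keep]
    return data[:, keep], kept_names, dropped_names


# Toy data with 8 observations: column c is almost a multiple of column a
a = np.arange(1.0, 9.0)
b = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
c = 2.0 * a + b
data = np.column_stack([a, b, c])

reduced, kept, dropped = drop_highly_correlated(data, ["a", "b", "c"])
print(kept, dropped)  # ['a', 'b'] ['c']
```

In this toy example the correlation between `a` and `c` is about 0.98, so `c` is removed, while `b` is only weakly correlated with `a` and is kept.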
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Figure 4: Correlation heat-map: Indicators-EU 
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Figure 5: Correlation heat-map: Indicators-Greece 


Venture capital plays a significant role in both Greek and EU innovation performance. It is a catalyst for start-ups 
and entrepreneurship: it facilitates innovations and enables them to evolve into marketable products, and it finances 
business ideas that would otherwise have no chance of gaining access to the required capital. In addition, designing 
innovative applications and products plays a fundamental role in a country’s innovation output. 


According to our results, apart from venture capital and design applications, the EU average relies heavily on 
high-level features to increase innovation, such as exports of medium and high technology products, broadband 
penetration and public-private co-publications. It is reasonable that exports of technology products play an important 
role in innovation output. In addition, research connections and productive collaboration projects between researchers 
from the private and public sectors, resulting in academic publications, contribute significantly to the growth of 
innovation; research is, after all, a main driver of innovation. Also, infrastructure, especially high-speed internet 
and networking, consolidates the e-potential of the EU. Realizing the full e-potential of Europe depends on 
establishing the conditions for electronic commerce and the Internet to flourish. Broadband penetration therefore 
plays a notable role in innovation output. 


Greece also seems to rely significantly on well-educated people and innovative SMEs to increase its innovation 
output. New doctorate graduates and people aged 25-34 having completed tertiary education play an important role in 
innovation output, as innovative ideas come mostly from educated people. SMEs in Greece represent 99.9% of the total 
private sector of the country. Specifically, micro enterprises (1-5 employees and revenues below 1 mil.) represent 
about 96.6% of the private sector and about 56% of the total employment of the Greek economy. In those terms, SMEs 
are the most significant part of the Greek and European economies, directly affecting both the financial and the social 
aspects of economic life. Innovative SMEs collaborating with other enterprises or institutions are an important 
indicator of Greece’s innovation. The flow of information and knowledge between public research institutions and 
companies, and between companies and other firms, tends to be important for the development of innovation. 


6. Synopsis and concluding remarks 


The presented analysis focuses on comparing Greece with the EU average concerning innovation. It assesses the 
systematic over-performance or under-performance of Greece compared to the EU. It reports the trends of the indicators, 
upwards (positive trend) or downwards (negative trend), and the importance of each indicator to the total innovation 
output for Greece and the EU. 


The columns “Performance” and “Trend” in Table 1 summarize the innovativeness of Greece compared to the EU. Please 
note that every empty cell denotes insufficient statistical evidence to decide (statistically insignificant). The 
indicators missing for Greece are Foreign doctorate students as a % of all doctorate students and Employment in 
fast-growing enterprises (% of total employment). 


Trend-line analysis is commonly used as a forecasting tool. In this direction, Tables 1 and 2 provide us with 
statistical evidence to make predictions on the innovativeness of Greece in the years to come. In Table 3, we highlight 
the five most important indicators, according to our methodology, affecting the fluctuation of the summary innovation 
index. 


Our data suggest that, to increase its innovation output, Greece should not focus only on its own five most important 
indicators. It should also try to follow the EU model by increasing the EU indicators highlighted in Table 3. 


The work in this paper is based on data from the 2018 version of the EIS. Specifically, the normalized scores of the 
composite and simple indicators for Greece and the EU average over 2010-2017 are selected. Data charts of the 
indicators and their percentage change each year are presented in the data analysis. Greece is compared with the EU 
average in country-level innovativeness by means of statistics and hypothesis testing. Over-performance of Greece 
versus the EU is found in the composite indicator Innovators and in the simple indicators innovative SMEs collaborating 
with others, international scientific co-publications, non-R&D innovation expenditures, percentage of population aged 
25-34 having completed tertiary education, sales of new-to-market and new-to-firm innovations, and SMEs introducing 
marketing or organizational innovations. 


In addition, a statistical comparison of the linear trend-lines of the Greek and EU indicators is provided, using 
generalized least-squares regression with auto-regressive errors. Greece shows a significant positive trend relative to 
the EU in the composite indicators Research systems, Intellectual assets and Employment impacts, and in the simple 
indicators Design applications, Trademark applications, International scientific co-publications, PCT patent 
applications, Percentage of population aged 25-34 having completed tertiary education, Percentage of population aged 
25-64 involved in lifelong learning, and R&D expenditure in the public sector. For more information on the systematic 
over-performance or under-performance of Greece versus the EU and on the trend-line analysis, please see Tables 1 and 2. 


The impact of the indicators on the volatility of the summary innovation index is calculated by using four well-known 
classifier models to build a model-based feature significance analysis for Greece and the EU. Specifically, the models 
are Logistic Regression, Random Forest, Extra-Trees and Support Vector Machines. Indicator correlation analysis 
provides us with evidence to exclude some indicators from our modeling. 


The analysis finds no significant association between the correlation matrices of Greece and the EU. However, this 
result is statistically insignificant; thus, we do not have enough statistical evidence to conclude that the two are 
unassociated. For completeness and comparison purposes, we also examine the correlation between Greece and the 
Netherlands, and between the EU and the Netherlands. The degree of difference between these pairs in terms of indicator 
correlations is to a great extent the same. 


Because only a limited number of data instances is available, the models are trained with cross-validation, 
specifically 3-fold cross-validation. The feature importance estimate of each fold is kept and then averaged for each 
model. Finally, after the importance values are transformed into percentages for each model, the percentage values are 
summarized. For further information on the indicator importance of the EU and Greece, please see Figure 6 or the more 
compact Table 3. 
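
The aggregation steps described above (average over folds, convert to percentages per model, then combine the models) can be sketched as follows. This is an illustration only: the function name `aggregate_importance`, the equal weighting of models in the final step and all numbers are hypothetical. In practice the per-fold scores could come from, for example, the `feature_importances_` attribute of tree ensembles or the absolute values of `coef_` of linear models in scikit-learn, which is again an assumption about tooling.

```python
import numpy as np


def aggregate_importance(fold_importances):
    """Combine per-fold, per-model importances into one percentage per feature.

    fold_importances maps a model name to an array of shape
    (n_folds, n_features) holding non-negative importance scores.
    """
    per_model_pct = []
    for scores in fold_importances.values():
        # Step 1: average the importance estimates over the folds
        mean_over_folds = np.asarray(scores, dtype=float).mean(axis=0)
        # Step 2: express the averaged importances as percentages
        per_model_pct.append(100.0 * mean_over_folds / mean_over_folds.sum())
    # Step 3 (assumption): weight all models equally when combining them
    return np.mean(per_model_pct, axis=0)


# Hypothetical scores for two models, three folds and three features
scores = {
    "model_a": [[0.5, 0.3, 0.2], [0.6, 0.2, 0.2], [0.4, 0.4, 0.2]],
    "model_b": [[2.0, 1.0, 1.0], [2.0, 1.0, 1.0], [2.0, 1.0, 1.0]],
}
combined = aggregate_importance(scores)
print(combined)  # approximately [50.0, 27.5, 22.5]
```

Converting to percentages before combining keeps models with differently scaled scores (such as coefficients versus impurity-based importances) from dominating the result.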
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Figure 6: Importance of indicators-Greece vs European Union (EU) 
Table 3: Importance of top-five Indicators for Greece and EU 
Greece                                   EU 
Indicator                  Importance    Indicator                Importance 
Design_applications        16%           Venture_capital          12% 
Venture_capital            14%           Exports_technology       10% 
Percentage_tertiary_edu    12%           Broadband_penetration    9% 
New_doctorate_grads        9%            Design_applications      8% 
Innovative_Smes            8%            Public_private_pubs      8% 


In an effort to further investigate our results, a Principal Component Analysis (PCA) has been carried out for the EU 
and Greek correlation matrices, as proposed in (Jolliffe and Cadima, 2016). It turns out that in both cases the 
eigenvalues are, as expected, non-negative real numbers, and most of them are zero. The non-zero eigenvalues are 
plotted in Figure 7. Furthermore, all the eigenvectors are real, with all of their elements in the interval (-1, 1). 
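
One plausible reason why most eigenvalues vanish, assuming each correlation matrix is computed from the eight yearly observations (2010-2017), is that such a matrix has rank at most seven regardless of the number of indicators. The sketch below illustrates this with synthetic data; the number of indicators (20), the random data and the tolerance are hypothetical choices, not values from the study.

```python
import numpy as np

rng = np.random.default_rng(0)
n_years, n_indicators = 8, 20  # hypothetical sizes
data = rng.normal(size=(n_years, n_indicators))

# Correlation matrix of the indicators (columns)
corr = np.corrcoef(data, rowvar=False)

# eigh is meant for symmetric matrices and returns real eigenvalues in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(corr)

tol = 1e-8  # values below this are treated as numerically zero
nonzero = eigenvalues[eigenvalues > tol]
print(len(nonzero))  # at most n_years - 1

# Share of variance explained by each component, largest first
explained = nonzero[::-1] / nonzero.sum()
print(explained[:2].sum())  # cumulative share of the first two components
```

Since the eigenvectors returned by `eigh` have unit length, each of their elements necessarily lies between -1 and 1, and the eigenvalues sum to the number of indicators because the diagonal of a correlation matrix consists of ones.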


The first two principal components cumulatively account for about 75% of the variance of the data in each case. 
Figure 8 shows the two biplots for the EU and Greek cases. A biplot is a display that attempts to represent both the 
observations and the variables of multivariate data in the same plot, together with their contribution to the principal 
components (Jolliffe and Cadima, 2016). The axes of the above-mentioned figures are principal components 1 and 2. 
Positively correlated variables point to the same side of the plot, while negatively correlated variables point to 
opposite sides. On both graphs, we are able to recognize groups of positively and negatively correlated indicators. 

Figure 7: Plot of all non-zero eigenvalues for the EU (red) and the Greek (blue) indicator correlation matrices 
Figure 8: PCA - Biplots of Indicators of the EU and Greece. 


Figure 8 clearly exhibits the fundamentally different behavior of the indicator data of Greece versus the EU through 
the years. Clusters can be easily identified in the case of Greece; for example, we can identify four clusters. The 
first contains data from the years 2010 to 2013, the second data from 2014 and 2015, and the third and fourth data from 
2016 and 2017 respectively. In the case of the EU, on the other hand, if we want to distinguish four clusters, we can 
say that cluster one contains data from 2010 and 2011, cluster two from 2012 and 2013, cluster three from 2014 and 
2015, and cluster four from 2016 and 2017. 


Regardless of the above observations, the complete eigen-analysis of our correlation matrices seems to be a 
well-justified and interesting subject. It is, though, beyond the scope of this paper. We foresee that our research 
study has the potential to provide explanations and evidence that assist the country in assessing its strengths and 
weaknesses regarding its innovation performance and, by extension, its economic growth. By comparison with the EU 
average, we show the position of Greece relative to the EU. 


The center of gravity of the scientific merit of our work lies closer to data science than to economic analysis. 
Therefore, a rigorous and systematic analysis of our results at the economic and development level is surely needed. 
Nevertheless, such a study is beyond the scope of this paper. 


Further future work concerns a deeper analysis of particular mechanisms of innovation, existing or new. The utilization 
of data from different and multiple data sources, as mentioned in the studies presented in section 2, will further 
contribute to our study. Data sources such as interviews with executive managers or chief executives of enterprises, or 
with academics from numerous institutions, will allow us to examine particular indicators of innovation in depth. Micro 
data from Eurostat are also expected to sharpen our view by adding significant new insight. 


Please note that a similar study at the regional level for Greece, analyzing and comparing the innovation performance 
of regions using the indicators provided by the Regional Innovation Scoreboard database is underway and it will be 
presented elsewhere. 
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Appendix 
In this section, we provide the tables of our analysis and further information. In addition, all the visualizations 
and graphs utilized in our analysis are available in figshare (https://doi.org/10.6084/m9.figshare.9844907.v1). 
Full names of the indicator acronyms: 
Summary_Innovation_I Summary innovation index. 
Innovation-friendly_env Innovation-friendly environments. 
Broadband_penetration Broadband penetration. 
Venture_capital Venture capital (% of GDP). 
Design_applications Design applications per billion GDP (in PPS). 
Trademark_apps Trademark applications per billion GDP (in PPS). 
Employment_activities Employment in knowledge-intensive activities (% of total employment). 
Enterprises_training Enterprises providing training to develop or upgrade ICT skills of their personnel. 
Innovative_Smes Innovative SMEs collaborating with others (% of SMEs). 
International_publications International scientific co-publications per million population. 
Knowledge_exports Knowledge-intensive services exports as % of total services exports. 
New_doctorate_grads New doctorate graduates per 1000 population aged 25-34. 
Non_rd Non-R&D innovation expenditures (% of turnover). 
Opportunity_enterpre Opportunity-driven entrepreneurship (Motivational index). 
Pct_patent PCT patent applications per billion GDP (in PPS). 
Percentage_tertiary_edu % population aged 25-34 having completed tertiary education. 
Percentage_lifelong_lea % population aged 25-64 involved in lifelong learning. 
Private_co_funding Private co-funding of public R&D expenditures (% of GDP). 
Public_private_pubs Public-private co-publications per million population. 
Rd_business R&D expenditure in the business sector (% of GDP). 
Rd_public R&D expenditure in the public sector (% of GDP). 
Sales Sales of new-to-market and new-to-firm innovations as % of turnover. 
Scientific_pubs Scientific publications among the top 10% most cited publications worldwide as % of total scientific publications of the country. 
Smes_in_house SMEs innovating in-house as % of SMEs. 
Smes_marketing SMEs introducing marketing or organizational innovations as % of SMEs. 
Smes_product SMEs introducing product or process innovations as % of SMEs. 
Exports_technology Exports of medium and high technology products as a share of total product exports. 


Cite this article as: Thanasis Zoumpekas, Manolis Vavalis, Elias Houstis (2021). Analysis of innovation with 
data science: The case of Greece. International Journal of Data Science and Big Data Analytics. 1(1), 20-42. 
doi: 10.51483/JDSBDA.1.1.2021.20-42. 


